(original by Dario Izzo - extended by Ekin Ozturk)
In this notebook we show how to use desolver together with vectorised gduals to numerically integrate multiple initial conditions at once, following the notebook example differential intelligence.
In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
import os
import numpy as np
os.environ['DES_BACKEND'] = 'numpy'
import desolver as de
import desolver.backend as D
D.set_float_fmt('gdual_vdouble')
Take as an example the task of learning a robotic controller. In neuroevolution (or Evolutionary Robotics), the controller is typically represented by a neural network, but for the purpose of explaining this new learning concept we will use a polynomial representation for the controller. Changing the controller to a neural network with the weights as parameters later on would not change the essence of what is done here.
In [2]:
# Definition of the controller in terms of some weight parameters
def u(state, weights):
    x, v = state
    a, b, c, e, f, g = weights
    return a + b*x + c*v + e*x*v + f*x**2 + g*v**2
In [3]:
# Definition of the equation of motion (our physics simulator propagating the system to its next state)
def eom(state, weights):
    x, v = state
    dx = v
    dv = u(state, weights)
    return (dx, dv)
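To make the dynamics concrete before moving to the adaptive integrator, here is a minimal sketch of propagating the equations of motion with a fixed-step forward-Euler scheme and plain floats. The specific weight values and initial state are hypothetical, chosen only for illustration:

```python
import numpy as np

def u(state, weights):
    x, v = state
    a, b, c, e, f, g = weights
    return a + b*x + c*v + e*x*v + f*x**2 + g*v**2

def eom(state, weights):
    x, v = state
    return (v, u(state, weights))

# Hypothetical small weights and an initial state (x, v)
weights = (0.1, -0.05, 0.02, 0.0, 0.01, -0.03)
state = np.array([1.0, 0.0])

# Fixed-step forward Euler from t = 0 to t = 3
dt = 0.01
for _ in range(300):
    state = state + dt * np.array(eom(state, weights))
```

An adaptive higher-order scheme replaces the fixed `dt` above with error-controlled step sizes, which is the point of the next paragraph.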
In Evolutionary Robotics, Euler propagators are commonly used, but we would like to use a higher-order integration scheme that is adaptive, in order to minimise computation and increase the accuracy and precision of the results.
We use gdual_vdouble in order to integrate multiple initial states simultaneously, without a significant increase in computation time, thanks to the efficient vectorisation of gdual_vdouble computations.
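The batching idea behind gdual_vdouble can be illustrated with plain numpy broadcasting: one vectorised expression evaluates the controller for a whole batch of states at once (this sketch only mirrors the constant part of the gduals; the derivative bookkeeping is what gdual_vdouble adds on top). The weight values here are hypothetical:

```python
import numpy as np

# A batch of 16 (x, v) states evaluated in one vectorised call,
# analogous to how gdual_vdouble carries 16 values per coefficient.
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, 16)
vs = rng.uniform(-1.0, 1.0, 16)

a, b, c, e, f, g = 0.1, -0.05, 0.02, 0.0, 0.01, -0.03
accel = a + b*xs + c*vs + e*xs*vs + f*xs**2 + g*vs**2
```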
In [4]:
num_different_integrations = 16
weights = D.array([D.gdual_vdouble([0.2*(np.random.uniform()-0.5)]*num_different_integrations, _, 4) for _ in "abcefg"])
x = [D.gdual_vdouble([2.*(np.random.uniform()-0.5) for i in range(num_different_integrations)])]
v = [D.gdual_vdouble([2.*(np.random.uniform()-0.5) for i in range(num_different_integrations)])]
y0 = D.array(x + v, dtype=D.gdual_vdouble)
In [5]:
def rhs(t, state, weights, **kwargs):
    return D.array(eom(state, weights))
We integrate the system using the Runge-Kutta-Cash-Karp scheme, with a dense output computed using a piecewise C1 Hermite interpolating spline.
This particular interpolator is used because it satisfies not only the state boundary conditions but also the gradients, and it is well suited to approximating the solution continuously up to first order in time.
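For reference, a cubic Hermite segment between two knots matches both the values and the first derivatives at its endpoints, which is what makes the dense output C1. A minimal numpy sketch (not desolver's internal implementation):

```python
import numpy as np

# Cubic Hermite interpolation between knots (t0, y0, dy0) and (t1, y1, dy1):
# matches values and first derivatives at both ends.
def hermite(t, t0, t1, y0, y1, dy0, dy1):
    h = t1 - t0
    s = (t - t0) / h
    h00 = 2*s**3 - 3*s**2 + 1
    h10 = s**3 - 2*s**2 + s
    h01 = -2*s**3 + 3*s**2
    h11 = s**3 - s**2
    return h00*y0 + h10*h*dy0 + h01*y1 + h11*h*dy1

# Interpolating sin on [0, 1] from its endpoint values and slopes
val = hermite(0.5, 0.0, 1.0, np.sin(0.0), np.sin(1.0), np.cos(0.0), np.cos(1.0))
```

At the knots the interpolant reproduces the integrator's states exactly; between knots the error is O(h^4) for smooth solutions.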
Note that the entire integration is done using gduals, so the solution and the interpolating spline stored in the OdeSystem instance, pyaudi_integration, contain all the information about how the state reached by our robot changes when we change the control weights.
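The gduals effectively carry a truncated Taylor expansion of the final state in the weights, and `evaluate(dweights)` later in the notebook simply evaluates that polynomial. A one-dimensional stdlib sketch of the same mechanism, expanding sin around a nominal point to order 4 (the order used for the weights above):

```python
import math

# Order-4 truncated Taylor expansion of sin around w0, mimicking how a
# gdual of order 4 predicts the effect of a perturbation dw.
w0, dw = 0.3, 0.05
derivs = [math.sin(w0), math.cos(w0), -math.sin(w0), -math.cos(w0), math.sin(w0)]
pred = sum(d * dw**k / math.factorial(k) for k, d in enumerate(derivs))

exact = math.sin(w0 + dw)
err = abs(pred - exact)  # O(dw**5) truncation error
```

The prediction made by `evaluate(dweights)` is accurate in exactly this sense: to the truncation order of the gduals, for sufficiently small weight perturbations.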
In [6]:
# We restrict the integration time due to the fact that the controller
# is quadratic in the state and causes the state to grow very rapidly.
pyaudi_integration = de.OdeSystem(rhs, y0=y0, dense_output=True, t=(0, 3.), dt=0.1, rtol=1e-12, atol=1e-12, constants=dict(weights=weights))
pyaudi_integration.set_method("RK45")
pyaudi_integration.integrate(eta=True)
x, v = pyaudi_integration.sol(D.linspace(0, 1, 20)).T
We numerically integrate 16 initial states and see the paths they follow based on the controller we have defined.
In [7]:
for _x, _v in zip(D.to_float(x).T, D.to_float(v).T):
    plt.plot(_x, _v, 'k')
plt.plot(x[0].constant_cf, v[0].constant_cf, 'ro')
plt.show()
In [8]:
xf, vf = x[-1], v[-1]
In [9]:
print("initial xf: {}".format(xf.constant_cf))
print("initial vf: {}".format(vf.constant_cf))
We have represented all of the robot's behaviour (x, v) as a polynomial function of the weights. So we now know what happens to the behaviour if we change the weights! Let's see: we only consider the final state here, but the same can be done for all earlier states.
Furthermore, we can compute this for all the different initial states that we integrated thus finding out how the final state of the robot changes for multiple initial conditions.
In [10]:
dweights = dict({'da': -0.002, 'db': 0.003, 'dc': -0.02, 'de': 0.03, 'df': 0.02, 'dg': -0.01})
# Let's predict the new final position of our 'robot' if we change its controller as defined above
print("new xf: {}".format(xf.evaluate(dweights)))
print("new vf: {}".format(vf.evaluate(dweights)))
We now simulate the behaviour again using the new weights, to check whether the prediction made by our differential learning is correct.
In [11]:
new_weights = D.array([it + dweights['d' + it.symbol_set[0]] for it in weights])
In [12]:
pyaudi_integration2 = de.OdeSystem(rhs, y0=y0, dense_output=True, t=(pyaudi_integration.t[0], pyaudi_integration.t[-1]), dt=0.1, rtol=1e-12, atol=1e-12, constants=dict(weights=new_weights))
pyaudi_integration2.set_method("RK45")
pyaudi_integration2.integrate(eta=True)
In [13]:
plt.figure(figsize=(16,16))
x2, v2 = pyaudi_integration2.sol(D.linspace(0, 1, 20)).T
for idx, (_x, _v) in enumerate(zip(D.to_float(x).T, D.to_float(v).T)):
    if idx == 0:
        plt.plot(_x, _v, 'C0', label='original')
    else:
        plt.plot(_x, _v, 'C0')
for idx, (_x, _v) in enumerate(zip(D.to_float(x2).T, D.to_float(v2).T)):
    if idx == 0:
        plt.plot(_x, _v, 'C1', label='simulation')
    else:
        plt.plot(_x, _v, 'C1')
for idx, (_x, _v) in enumerate(zip(D.array([it.evaluate(dweights) for it in x]).T, D.array([it.evaluate(dweights) for it in v]).T)):
    if idx == 0:
        plt.plot(_x, _v, 'C2+', markersize=8., label='differential learning')
    else:
        plt.plot(_x, _v, 'C2+', markersize=8.)
# plt.plot(x[0].constant_cf, v[0].constant_cf, 'ro')
plt.legend(loc=2)
plt.show()
Since we have integrated multiple trajectories, printing all the final states and comparing them to the evaluated polynomial is visually difficult to parse. Instead, we look at the maximum and mean absolute differences between the numerical integration and the map evaluated with the new weights.
In [14]:
print("Maximum Absolute Difference xf:\t{}".format(D.max(D.abs(D.to_float(x2[-1]) - D.to_float(x[-1].evaluate(dweights))))))
print("Mean Absolute Difference xf: \t{}".format(D.mean(D.abs(D.to_float(x2[-1]) - D.to_float(x[-1].evaluate(dweights))))))
print()
print("Maximum Absolute Difference vf:\t{}".format(D.max(D.abs(D.to_float(v2[-1]) - D.to_float(v[-1].evaluate(dweights))))))
print("Mean Absolute Difference vf: \t{}".format(D.mean(D.abs(D.to_float(v2[-1]) - D.to_float(v[-1].evaluate(dweights))))))